An Improved Graph Model for Chinese Spell Checking

Authors

  • Yang Xin
  • Hai Zhao
  • Yuzhu Wang
  • Zhongye Jia
Abstract

In this paper, we propose an improved graph model for Chinese spell checking. The model combines a graph model for generic errors with two independently trained models for specific errors. First, a graph model represents a Chinese sentence, and a modified single-source shortest path algorithm is run on the graph to detect and correct generic spelling errors. Then, we use conditional random fields to handle two specific kinds of common errors: the confusion of “在” (at; pinyin: zai) with “再” (again, more, then; pinyin: zai), and the confusion among “的” (of; pinyin: de), “地” (-ly, adverb-forming particle; pinyin: de), and “得” (so that, have to; pinyin: de). Finally, a rule-based system handles pronoun usage confusions, such as “她” (she; pinyin: ta) versus “他” (he; pinyin: ta), and some other fixed collocation errors. The proposed model is evaluated on the standard data set released by the SIGHAN Bake-off 2014 shared task and gives competitive results.

∗This work was partially supported by the National Natural Science Foundation of China (No. 60903119, No. 61170114, and No. 61272248), the National Basic Research Program of China (No. 2013CB329401), the Science and Technology Commission of Shanghai Municipality (No. 13511500200), the European Union Seventh Framework Program (No. 247619), the Cai Yuanpei Program (CSC fund 201304490199 and 201304490171), and the art and science interdiscipline funds of Shanghai Jiao Tong University (A study on mobilization mechanism and alerting threshold setting for online community, and media image and psychology evaluation: a computational intelligence approach).

†Corresponding author.
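The generic-error stage described in the abstract (a sentence represented as a graph, then a shortest-path search) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy word counts, the confusion sets, the penalty values, and the function names are hypothetical, and a plain dynamic-programming pass over the word lattice stands in for the paper's modified single-source shortest path algorithm. Nodes are character positions, edges are dictionary words (optionally with one confusable character replaced), and edge weights are negative log unigram probabilities plus a substitution penalty.

```python
import math

# Toy resources: illustrative assumptions, not the data used in the paper.
WORD_COUNTS = {"我": 50, "在": 40, "再": 5, "家": 10, "再见": 25, "在家": 40}
CONFUSION = {"在": {"再"}, "再": {"在"}}  # characters that are easily confused
SUBSTITUTION_PENALTY = 2.0  # extra cost per replaced character (assumed value)
OOV_COST = 10.0             # cost of keeping an unknown single character (assumed)
MAX_WORD_LEN = 4
TOTAL = sum(WORD_COUNTS.values())


def word_cost(word):
    """Return -log P(word) under the toy unigram model, or None if unknown."""
    count = WORD_COUNTS.get(word)
    return None if count is None else -math.log(count / TOTAL)


def candidates(span):
    """Yield (word, substitutions): the span itself, then one-character swaps."""
    yield span, 0
    for i, ch in enumerate(span):
        for alt in CONFUSION.get(ch, ()):
            yield span[:i] + alt + span[i + 1:], 1


def correct(sentence):
    """Return the lowest-cost reading of the sentence over the word lattice."""
    n = len(sentence)
    best = [math.inf] * (n + 1)   # best[j]: cheapest cost to reach position j
    back = [None] * (n + 1)       # back[j]: (previous position, word used)
    best[0] = 0.0
    # Positions are already in topological order, so one left-to-right pass
    # computes shortest paths from the source node 0.
    for i in range(n):
        for j in range(i + 1, min(n, i + MAX_WORD_LEN) + 1):
            for word, subs in candidates(sentence[i:j]):
                cost = word_cost(word)
                if cost is None:
                    # Keep unknown single characters so a path always exists.
                    if j == i + 1 and subs == 0:
                        cost = OOV_COST
                    else:
                        continue
                total = best[i] + cost + subs * SUBSTITUTION_PENALTY
                if total < best[j]:
                    best[j] = total
                    back[j] = (i, word)
    # Walk the back-pointers from the end of the sentence to recover the path.
    words, pos = [], n
    while pos > 0:
        pos, word = back[pos]
        words.append(word)
    return "".join(reversed(words))


print(correct("我再家"))  # 我在家
```

In this toy setup the misspelled "再家" is replaced because the dictionary word "在家" is cheap enough to outweigh the substitution penalty, while "再见" is left unchanged. A real system would need far larger language-model and confusion-set resources, and the balance between word cost and penalty would have to be tuned.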


Similar resources

Graph Model for Chinese Spell Checking

This paper describes our system in the Bake-Off 2013 task of SIGHAN 7. We illustrate that Chinese spell checking and correction can be efficiently tackled by utilizing a word segmenter. A graph model is used to represent the sentence and a single source shortest path (SSSP) algorithm is performed on the graph to correct spelling errors. Our system achieves 4 first ranks out of 10 metrics on the...


Integrating Dictionary and Web N-grams for Chinese Spell Checking

Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Nevertheless, compared to spell checkers for alphabetical languages (e.g., English or French), Chinese spell checkers are more difficult to develop because there are no word boundaries in the Chinese writing system and errors may be caused by various ...


A Hybrid Meta-heuristic Approach to Cope with State Space Explosion in Model Checking Technique for Deadlock Freeness

Model checking is an automatic technique for software verification through which all reachable states are generated from an initial state to find errors and desirable patterns. In the model checking approach, the behavior and structure of the system should be modeled. A graph transformation system is a graphical formal modeling language to specify and model the system. However, modeling of large s...


Chinese Spell Checking Based on Noisy Channel Model

Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Compared to English, Chinese has no word boundaries and there are various Chinese input methods that cause different kinds of typos, so it is more difficult to develop spell checkers for Chinese. In this paper, we introduce a novel method for correcti...


Khmer Spell Checker

Khmer is the official language of Cambodia. It is a complex language. Similar to Chinese, Japanese and Thai, Khmer words are written without spaces or other word delimiters. This is a major challenge in spell checking Khmer since there is no simple way to determine word boundaries. However, it is feasible to spell check Khmer. The process of spell checking Khmer is different from the spell chec...



Publication date: 2014